Begin with the g4g example, but match this class's style.

In [27]:
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
from scipy import stats

xs = [5,7,8,7,2,17,2,9,4,11,12,9,6]
ys = [99,86,87,88,111,86,103,87,94,78,77,85,86]

m, b, _, _, _ = stats.linregress(xs, ys)

plt.scatter(xs, ys)
plt.plot(xs, [m * x + b for x in xs])
plt.show()
(Image: scatter plot of the data with the fitted regression line.)

For the error, you computed the sum of squared errors. Well done. You probably did something like the following:

In [28]:
errors = [(m * xs[i] + b - ys[i]) ** 2 for i in range(len(xs))]
print(errors)
[np.float64(21.626948352384407), np.float64(23.49288828029534), np.float64(4.39178485240792), np.float64(8.1051031441658), np.float64(129.88283706421814), np.float64(160.4258038281837), np.float64(11.536994532945045), np.float64(0.11859128985571113), np.float64(4.413400213657534), np.float64(34.12657393735711), np.float64(25.91326891120767), np.float64(5.496074733564338), np.float64(43.53669186049348)]

We can take the sum of squared errors by just adding these all up! Python has a handy built-in function to do that, or you can write your own.

In [29]:
print(sum(errors))
473.0669610007362
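If you'd rather write your own sum instead of using the built-in, a minimal sketch looks like this (the `errors` values here are stand-ins, not the squared errors from above):

```python
# A hand-rolled version of the built-in sum(): accumulate a running total.
def my_sum(values):
    total = 0
    for v in values:
        total += v
    return total

errors = [4.0, 2.5, 0.5]   # stand-in values; in the notebook this is the squared-error list
print(my_sum(errors))      # → 7.0, same as sum(errors)
```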

Okay, so how did machine learning work? Well, we picked a random m and b.

In [30]:
import random

I was fairly surprised students used random() without adding or multiplying a value, since it only returns numbers between zero and one.

After all, we've been working with functions that take values in those ranges and rescale them to be much closer to our data, so we know what that looks like.

Pick a random point, find the slope and intercept of the line through it and the previous point, and that is the first guess.

In [31]:
pt = int(random.random() * len(xs))  # a random index into the data

# I had Gemini do this.
# Note: if pt is 0, then pt - 1 is -1, which Python treats as the last point.

m = (ys[pt] - ys[pt - 1]) / (xs[pt] - xs[pt - 1])
b = ys[pt] - m * xs[pt]
pt, m, b
Out[31]:
(4, -4.6, 120.2)
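As an aside, `int(random.random() * len(xs))` works, but the standard library can also pick a random index directly; a small equivalent sketch:

```python
import random

xs = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]

# randrange(n) returns an integer in [0, n), so it is always a valid index.
pt = random.randrange(len(xs))
print(pt, xs[pt])
```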

Actually, forget machine learning for a moment, let's think about this: what are the largest and smallest possible m and b values under this model?

In [32]:
# you don't need to know how to write this, but we had a lecture on how you could understand it.
ms = [(ys[pt1] - ys[pt2]) / (xs[pt1] - xs[pt2]) for pt2 in range(len(xs)) for pt1 in range(len(xs)) if xs[pt1] != xs[pt2]]
print(max(ms), min(ms))
5.0 -13.0
In [33]:
bs = [ys[i//len(xs)] - ms[i] * xs[i//len(xs)] for i in range(len(ms))]
print(max(bs), min(bs))
233.0 42.0

Well, that seems easy enough. Let's simply compute the sum of squared errors for all possible pairs of m and b values and plot it. We have code from last class (via Gemini) that takes these two values and computes the sum of squared errors, or we can write it ourselves.

In [34]:
sse = lambda m, b : sum([(m * xs[i] + b - ys[i]) ** 2 for i in range(len(xs))])  # Sum of Squared Errors
print(sse(5,223), sse(-13,42))
391505 306964

I want to note something I do here: I'm not making a list! I'm making a list of lists. A list of lists is a LOT like an image. In fact, now that I've said that, I'm going to make it into an image...

In [35]:
sses = [[sse(ms[i], bs[j]) for i in range(len(ms))] for j in range(len(bs))]

I wonder what the minimum and maximum SSE are. It's a little bit annoying with the two dimensions.

In [36]:
print(max([max(i) for i in sses]), min([min(i) for i in sses]))
437345.0 473.09566326530614
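As an aside, once the list of lists becomes a NumPy array, `.max()` and `.min()` look across both dimensions at once, so the nested comprehension isn't needed; a sketch with stand-in numbers:

```python
import numpy as np

# Stand-in 2-D grid; in the notebook, sses plays this role.
grid = [[5972.5, 3181.25], [473.1, 437345.0]]

arr = np.array(grid)
print(arr.max(), arr.min())  # → 437345.0 473.1
```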

Oh, I could look at it as a dataframe or even as an image!

In [37]:
import pandas as pd
import numpy as np
from PIL import Image as im

df = pd.DataFrame(sses)
df.head()
Out[37]:
0 1 2 3 4 5 6 7 8 9 ... 140 141 142 143 144 145 146 147 148 149
0 5972.500000 3181.250000 3411.500000 3181.250000 15138.854167 13471.916667 5435.250000 92803.250000 4067.500000 4995.331633 ... 28491.500000 45595.250000 5151.687500 23753.250000 2918.687500 26858.583333 3181.250000 11826.530000 12427.500000 20861.916667
1 10528.750000 1550.000000 5492.750000 1550.000000 6288.854167 5240.666667 1329.000000 68897.000000 1198.750000 1242.653061 ... 15722.750000 29114.000000 9089.187500 12222.000000 1906.187500 14502.333333 1550.000000 4255.280000 4608.750000 10155.666667
2 7307.500000 2041.250000 3756.500000 2041.250000 11111.354167 9691.916667 3305.250000 82753.250000 2432.500000 3006.760204 ... 22896.500000 38515.250000 6239.187500 18653.250000 2026.187500 21428.583333 2041.250000 8310.530000 8812.500000 16091.916667
3 10528.750000 1550.000000 5492.750000 1550.000000 6288.854167 5240.666667 1329.000000 68897.000000 1198.750000 1242.653061 ... 15722.750000 29114.000000 9089.187500 12222.000000 1906.187500 14502.333333 1550.000000 4255.280000 4608.750000 10155.666667
4 20978.923611 4781.423611 13055.423611 4781.423611 1098.402778 772.090278 1672.923611 46140.923611 2986.423611 1999.076672 ... 5960.423611 15020.423611 18817.486111 3903.423611 5859.486111 5221.256944 4781.423611 556.703611 621.423611 2799.590278

5 rows × 150 columns

In [38]:
im.fromarray(np.array(sses).astype(np.uint8)).show()
(Image: the raw SSE grid rendered directly as an image.)

These all look bad because we aren't forcing the error values into the typical range of color values. Let's do that.

Colors should be between 0 and 255.

Errors range from about 473 to about 437k.

We divide by 437k and multiply by 255, more or less.
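That scaling step in miniature, with a few stand-in error values (437345 is the maximum SSE printed above):

```python
# Map values in [0, max_err] linearly onto the color range [0, 255].
max_err = 437345
errors = [473.1, 100000, 437345]

scaled = [int(e / max_err * 255) for e in errors]
print(scaled)  # → [0, 58, 255]
```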

In [39]:
im.fromarray(np.array([[i//(437345//250) for i in j] for j in sses]).astype(np.uint8)).show()
(Image: SSE grid with linear scaling; almost entirely black.)

We can't see anything because we shouldn't actually be using linear scaling (most likely). We need something better, like a square root.

In [40]:
437345 ** .5
Out[40]:
661.3206483998515

Not bad. We take the square root and divide by 3, which keeps everything under 255 (661 / 3 ≈ 220).

In [41]:
im.fromarray(np.array([[(i**.5)//3 for i in j] for j in sses]).astype(np.uint8)).show()
(Image: SSE grid with square-root scaling.)

Now that is getting somewhere. Still hard to see though. I wonder if plotly can solve this problem for us.

In [42]:
# prompt: plotly 3d surface plot of sses

import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default='notebook'


# Create the surface plot
fig = go.Figure(data=[go.Surface(z=sses)])

# Customize the plot
fig.update_layout(title='Sum of Squared Errors',
                  scene=dict(
                      xaxis_title='m',
                      yaxis_title='b',
                      zaxis_title='SSE'
                  ))

# Display the plot
fig.show()

Going to flatten it out a little with the same square-root scaling.

In [43]:
go.Figure(data=[go.Surface(z=[[(i**.5)//3 for i in j] for j in sses])]).show()

Our job as machine learning people is to find where the error is lowest. We find it by picking a point and figuring out which direction to move to get somewhere lower.

This is called gradient descent, and it is probably the most powerful computational technique known at this point in history. It is much faster than computing every error, as we have done today, especially on meaningfully large data sets, but today's plot is a good way to visualize how it works.
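To make that concrete, here is a minimal gradient-descent sketch for this same line-fitting problem; the learning rate and iteration count are hand-picked assumptions, not values from class:

```python
xs = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
ys = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

m, b = 0.0, 0.0   # starting guess
lr = 0.0005       # learning rate (step size), an assumption picked by hand

for _ in range(20000):
    # Partial derivatives of the sum of squared errors with respect to m and b.
    dm = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys))
    db = sum(2 * (m * x + b - y) for x, y in zip(xs, ys))
    # Step downhill: move opposite the gradient.
    m -= lr * dm
    b -= lr * db

print(m, b)  # lands very close to the linregress slope and intercept
```

Each step only needs one pass over the data, instead of evaluating the whole grid of (m, b) pairs.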